This report summarizes the research performed under Contract No.
N00014-83-C-0537 to the Office of Naval Research, entitled "Human
Factors in Rule-Based Systems." The period of performance of this
effort was August 15, 1985, to September 30, 1985. As indicated in
the original proposal, this research effort was oriented toward
supporting two long-term, interrelated goals: (1) to advance a general
theory of the cognitive psychology of user interactions with
rule-based systems, and (2) to recommend, based on the general theory,
design principles for the user engineering of expert systems. A
general discussion of the work performed during this effort, and the
results therefrom, is presented below.


BACKGROUND AND PROBLEM SELECTION


Most military applications of expert system technology involve
building expert systems that are, from a psychological person/machine
interface perspective, very different from traditional systems, such
as PROSPECTOR and MYCIN, that were developed in laboratory settings.
In particular, as documented in Lehner (1984), most military
applications of expert system technology differ from the traditional
systems in that:

(1) The traditional systems addressed problem domains with
a well-established, well-documented, and static knowledge
base. Military applications tend to involve ill-specified
knowledge bases, where human experts differ considerably in
their opinions.

(2) In the traditional systems, it was sufficient to model
the system after one good human expert. In many military
applications, the system must somehow merge the expertise of
multiple human experts with differing areas of expertise.

(3) In the traditional systems, the assumed user community
was not very diverse. Users of medical diagnosis programs
were likely to have some type of medical degree (M.D., R.N.,
etc.). Users of systems such as PROSPECTOR were assumed to
be people with a significant background in geology. In many
military applications, on the other hand, the level and type
of experience and training of users will vary considerably.

(4) Finally, the traditional systems were stand-alone. The
user entered all problem-specific data. As a result, it
could be assumed that users were already familiar with all
data available for solving the problem at hand. Many
military applications, on the other hand, require that the
expert system be embedded within a larger 'background'
system. As a result, it must be assumed that users will not
be, a priori, familiar with the specifics of the problem
being addressed. Indeed, the user may not even know a
problem exists until the expert system has already analyzed
data obtained from the background system and generated its
conclusions and recommendations.


Given (1) through (4) above, it seemed reasonable to characterize
the general user/expert system setting as a situation where two expert
problem solvers are trying to cooperatively solve a common decision
problem, even though these two experts may use different decision
processes, heuristics, and data to solve it. For users of military
expert systems, these differences will often be very pronounced. This
is not a very encouraging setting, particularly if one accepts the
conventional wisdom on user/expert system interaction, which holds that
the more an expert system mimics a user's problem-solving style and
heuristics, the better cooperative user/system problem-solving
performance will be. Indeed, systems that are inconsistent with the
user's approach tend to be flatly rejected by users (Clancey, 1984).

The problem we selected to address was to discover the
conditions under which user/expert system performance would remain
high despite significant differences between the approaches of the two
problem solvers. This naturally leads to the more general research
issue of identifying the primary drivers of effective user/expert
system interaction. Furthermore, we wanted to focus on mediating
variables that could easily generalize to any user/machine environment
that involved an 'intelligent' machine. Finally, our professional
interests led us to focus on cognitive issues rather than on
perceptual/display design issues. In addition, it was felt that if one
could satisfactorily address the cognitive issues, then it should be
possible to derive a number of specific implications for display
design.

Given the above orientation, the next question became one of
identifying the cognitive dimensions that need to be considered. A
literature review suggested two basic dimensions:

(1) human cognitions about the problem domain, and

(2) human cognitions about the machine's cognitions about
the problem domain.

This led us to postulate that two of the key drivers of the nature of
user/expert system interactions would, to a significant extent, be:

(1) the degree to which the person's and machine's
cognitions about a problem overlapped (the
cognitive consistency dimension), and

(2) how well the user understood the machine's
cognitions about a problem domain, even when they
differed significantly from the user's
(the mental model dimension).

It was further postulated that if the user had a good mental model of
how the machine goes about solving the problem, the user should still
be able to interact effectively with the machine even if the machine
solves the problem in a manner different from the user's.

Of course, the value of interacting with the machine will depend
on exactly how the user's and machine's cognitions about a problem
differ. If the person and machine come up with different conclusions,
and the machine has access to relevant data the user didn't know
about, then interacting with the machine to retrieve that data is
clearly very valuable. On the other hand, if the person and machine
generated different conclusions because they used different
heuristics but the same data, then the value of being able to 'trace'
the machine's logic will depend on how well the user can incorporate
the machine's cognitions into his or her own reasoning about the
problem. The experiments discussed below address these issues.


SUMMARY OF EXPERIMENTS PERFORMED


The experiments performed under this project [see Lehner et al.
(1984); Lehner & Zirk (1985); Hall (1985)] were oriented toward
testing the general hypothesis that a good mental model of an expert
system's cognitions would lead to good user/expert system performance
even when the user's cognitions in solving the problem were very
different from the expert system's. Furthermore, it was hypothesized
that when the user did not have a good mental model of the expert
system's processing, the conventional wisdom, which suggests that
performance improves as the overlap between the person's and machine's
cognitions increases, would hold true.

The procedure common to the first three experiments used a generic
expert system development package (PAR's ERS software) that is
similar in many respects to the classical PROSPECTOR system. In
particular, the user interface of this software is fairly typical of
systems such as PROSPECTOR. Using this expert system development
package, a small rule base was built for selecting from among
alternative stocks under various stock market conditions. In all
three experiments, subjects were split into two groups according to
decision process (based on the procedures they were taught for
solving the problem manually): a goal-driven process that was similar
to the stock market expert system's, and a data-driven process that
was very different from the stock market expert system's procedures.
Both processes, if properly applied, generated the same answers. In
addition, a 'good mental model' and a 'poor mental model' condition
were created. The good mental model subjects were given two pages of
typed text (double-spaced) that explained that the expert system used
rules, and that these rules could be conceptually organized into
inference networks. In the poor mental model condition, subjects
received only a short general description of how the expert system
solved a stock market decision problem.
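As an illustration of the two decision processes, the following
minimal Python sketch evaluates one hypothetical rule base in both
directions. The rule names and the rule base itself are invented for
illustration (they are not the stock market rules used in the
experiments), and the certainty factors and probabilistic updating of
PROSPECTOR-style systems are deliberately omitted. Forward chaining
stands in for the data-driven process and backward chaining for the
goal-driven process; on the same rules and data, both reach the same
conclusion.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical rules: (premises that must all hold, conclusion).
# Invented for illustration; not the rule base used in the study.
Rule = Tuple[FrozenSet[str], str]
RULES: List[Rule] = [
    (frozenset({"earnings_rising", "low_debt"}), "strong_fundamentals"),
    (frozenset({"market_bullish"}), "favorable_market"),
    (frozenset({"strong_fundamentals", "favorable_market"}), "buy_stock_A"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Data-driven: keep firing rules whose premises hold until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   seen: Optional[Set[str]] = None) -> bool:
    """Goal-driven: prove the goal by proving all premises of some rule that concludes it."""
    if goal in facts:
        return True
    seen = set() if seen is None else seen
    if goal in seen:  # guard against cyclic rules along the current path
        return False
    path = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, path) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


facts = {"earnings_rising", "low_debt", "market_bullish"}
print("buy_stock_A" in forward_chain(facts, RULES))  # True
print(backward_chain("buy_stock_A", facts, RULES))   # True
```

The two functions differ only in the direction of search: the
data-driven version starts from what is known and works upward, while
the goal-driven version starts from a candidate recommendation and
works down to the data it requires.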

In Experiment 1, both the subject and the expert system had
isomorphic decision rules (i.e., they would come up with the same
answers), but there was inconsistency in the data sets. The expert
system had access to data the subjects did not initially have, while
the data the subjects did have was somewhat more accurate than the
expert system's. Under this condition, subjects needed to interact
with the expert system to get all relevant data, but the expert system
did not necessarily generate the correct answers.

The primary results for Experiment 1 are shown below. The cell
values indicate the percent of problems users answered correctly.

                                   User's Decision Process
  Quality of User's        Consistent with      Inconsistent with
  Mental Model             expert system        expert system
  Good                          58%                  83%
  Poor                          50%                  25%

Clearly, when the subjects and the expert system used similar decision
processes, mental model had little impact (58% vs. 50%). On the other
hand, when the subjects and expert system employed different decision
processes, the impact of a mental model was dramatic (83% vs. 25%).

Analyzing the results of the first experiment, we concluded two
things: (1) the data-driven procedure was easier for subjects to
employ manually than the goal-driven procedure, and (2) the primary
driver of the 83%-25% difference between the good and poor mental
model subjects under the process-inconsistent condition was that
subjects in the good mental model condition were able to manipulate
the expert system effectively to gain access to the missing data,
while the poor mental model subjects often failed to extract the
missing data in time to solve the problem. The poor mental model
subjects did not need to 'manipulate' the expert system to obtain
necessary data.

Experiment 2 empirically tested conclusion (2) above. In this
experiment, the good mental model/inconsistent and poor mental
model/inconsistent conditions were replicated, with the single
exception that subjects in the latter condition had an additional
command that gave them an immediate display of all the relevant data
the machine had available. The primary result is shown below.

                                   User's Decision Process
  Quality of User's        Inconsistent with
  Mental Model             expert system
  Good                          78%
  Poor                          69%

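With direct access to the machine's data, poor mental model subjects
answered 69% of problems correctly (compared with 25% in Experiment 1),
close to the 78% achieved by good mental model subjects. We felt our
hypothesis was supported.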

Summarizing these two experiments, it appears that having a good
mental model allowed users to be effective operators of the expert
system even when the user and expert system employed inconsistent
decision processes. In particular, subjects with a good mental model
were able to access necessary data, while subjects with a poor mental
model often failed to do so. It should be noted, however, that in
these experiments subjects had little need to actually trace the
expert system's reasoning to get assistance; they simply needed to
find a sequence of commands that would get them to the missing data.
Consequently, it was not clear to what extent a good mental model
helped subjects actually understand how the system generated a
recommendation.

Experiment 3 addressed this latter issue. In particular, we wanted to
see the extent to which a good mental model helped subjects to isolate
'errors' in an expert system rule set. In this experiment, the four
cells in Experiment 1 were replicated with the following changes:

(1) both the subject and expert system had the same data;

(2) some of the parameter values in the rules were modified,
leading to erroneous conclusions;

(3) for each problem, subjects were given the correct answer
based on the manual procedures they were taught to use; and

(4) the subjects' task was to find the erroneous parameter
value(s) and rule(s) in the expert system.
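The error-isolation task in Experiment 3 can be pictured as a
top-down trace through an inference network. The Python sketch below
is hypothetical: the network, node names, data, and thresholds are
invented, each leaf rule is reduced to a single "value at least
threshold" test, and one threshold in the system's copy has been
altered. Starting from the top-level conclusion, the trace descends
only into nodes whose system value disagrees with the value derived by
the manual procedure, and reports the deepest disagreeing nodes as the
suspect rules.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical rule representation: a leaf is (datum name, minimum value);
# an internal node is a list of child nodes that must all hold (logical AND).
Spec = Union[Tuple[str, float], List[str]]
Network = Dict[str, Spec]


def value(node: str, net: Network, data: Dict[str, float]) -> bool:
    """Evaluate one node of the inference network on the problem data."""
    spec = net[node]
    if isinstance(spec, list):
        return all(value(child, net, data) for child in spec)
    datum, minimum = spec
    return data[datum] >= minimum


def isolate(node: str, system: Network, manual: Network,
            data: Dict[str, float]) -> List[str]:
    """Return the deepest nodes whose system value disagrees with the manual value."""
    if value(node, system, data) == value(node, manual, data):
        return []  # this branch agrees with the manual procedure
    spec = system[node]
    if isinstance(spec, list):
        suspects = [s for child in spec for s in isolate(child, system, manual, data)]
        if suspects:
            return suspects
    return [node]  # the disagreement originates at this node


# Hypothetical manual (correct) rules and a system copy with one altered threshold.
MANUAL: Network = {
    "buy": ["good_earnings", "good_liquidity"],
    "good_earnings": ("eps_growth_pct", 10.0),
    "good_liquidity": ("current_ratio", 1.5),
}
SYSTEM: Network = dict(MANUAL, good_earnings=("eps_growth_pct", 15.0))

data = {"eps_growth_pct": 12.0, "current_ratio": 2.0}
print(isolate("buy", SYSTEM, MANUAL, data))  # ['good_earnings']
```

One consequence of this representation is that an altered parameter
is detectable only when the problem data fall between the original and
altered values; with data that clear both thresholds, the trace
returns an empty list.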


The results of this experiment are shown below. Cell values
indicate the percent of problems where subjects isolated the erroneous
rules.

                                   User's Decision Process
  Quality of User's        Consistent with      Inconsistent with
  Mental Model             expert system        expert system
  Good                          68%                  65%
  Poor                          45%                  30%


As with Experiment 1, cognitive consistency had a positive impact
only when subjects had a poor mental model.
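The interaction described above can be checked directly against the
reported percentages. The short Python sketch below computes, for each
mental model condition, the percentage-point difference between the
process-consistent and process-inconsistent cells. The assignment of
the Experiment 3 values to cells is the reading that agrees with the
statement above and should be treated as an interpretation.

```python
# Percent of problems answered correctly (Experiment 1) or errors isolated
# (Experiment 3), keyed by (mental model quality, user's process vs. expert system).
RESULTS = {
    "Experiment 1": {("good", "consistent"): 58, ("good", "inconsistent"): 83,
                     ("poor", "consistent"): 50, ("poor", "inconsistent"): 25},
    "Experiment 3": {("good", "consistent"): 68, ("good", "inconsistent"): 65,
                     ("poor", "consistent"): 45, ("poor", "inconsistent"): 30},
}


def consistency_effect(cells, model):
    """Percentage-point gain from a consistent (rather than inconsistent) decision process."""
    return cells[(model, "consistent")] - cells[(model, "inconsistent")]


for name, cells in RESULTS.items():
    print(name, {m: consistency_effect(cells, m) for m in ("good", "poor")})
# Experiment 1 {'good': -25, 'poor': 25}
# Experiment 3 {'good': 3, 'poor': 15}
```

Consistency helps in the poor mental model rows (25 and 15 points).
For good mental model subjects it is negligible in Experiment 3 and
negative in Experiment 1, where the data-driven procedure was found to
be easier to apply manually.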

Finally, in an attempt to generalize the results of the above
experiments, a fourth experiment was performed using a complex,
'real-world' expert system rather than the artificial stock market
problem used in the previous experiments. Specifically, the Stanford
University MYCIN system was used as the testbed. Subjects (third- and
fourth-year medical students) were given a summary MYCIN
recommendation for an individual test case and were required to
exercise MYCIN to determine exactly how it generated its
recommendation. (See Hall, 1985, for details.) In this experiment,
subjects were in either a poor mental model or a good mental model
condition, using essentially the same manipulation of mental model
used in the previous experiments. The primary dependent variable was
the number of individual MYCIN rules that subjects examined before
finding the specific, high-level rule that resulted in the MYCIN
recommendation.

Preliminary results for this experiment are shown on the
following page. Unfortunately, because of limited subject
availability, only three subjects per group were run by project
termination. Even with only six subjects, however, a t-test comparison
of the two groups was 'significant' at the .1 level (one-tailed test).
We expect to collect some additional data in the near future.
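For reference, a standard independent-samples t statistic for
comparing the two groups is shown below. The report does not state
which variant was used; the pooled-variance form, and the resulting
degrees of freedom for two groups of three subjects, are assumptions.
Here x-bar denotes a group's mean number of MYCIN rules examined, s^2
its sample variance, and n its size; the one-tailed direction assumes
the hypothesis that poor mental model subjects examine more rules.

```latex
s_p^2 = \frac{(n_{\text{poor}}-1)\,s_{\text{poor}}^2 + (n_{\text{good}}-1)\,s_{\text{good}}^2}
             {n_{\text{poor}} + n_{\text{good}} - 2},
\qquad
t = \frac{\bar{x}_{\text{poor}} - \bar{x}_{\text{good}}}
         {s_p \sqrt{\frac{1}{n_{\text{poor}}} + \frac{1}{n_{\text{good}}}}}

n_{\text{poor}} = n_{\text{good}} = 3 \;\Rightarrow\; \mathrm{df} = 3 + 3 - 2 = 4,
\qquad
\text{reject } H_0 \text{ at } \alpha = .10 \text{ (one-tailed) if } t > t_{.90,\,4} \approx 1.533
```

With four degrees of freedom the critical value is well above the
normal-approximation value at the same level (about 1.28).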

DISCUSSION AND CONCLUSION

The basic conclusion from these experiments appears to be that, in
a cooperative human/intelligent machine problem-solving setting where
the human and machine employ different problem-solving procedures, it
is generally essential that the user have an accurate model of how
that machine operates. Even for relatively simple decision problems,
such as the one used in these experiments, a poor mental model leads
to anywhere from a 30% to 60% drop in performance. For military
expert system applications, the need for a good mental model may be
particularly important. As previously noted, users of military expert
systems are likely to differ significantly from the expert system in
both the problem-specific data they are initially aware of and the
domain-specific heuristics utilized in problem solving. User/expert
system interaction in these systems is therefore a situation that
naturally involves a great deal of cognitive inconsistency. As a
result, creating an accurate mental model may be an essential
ingredient for the successful transfer of military expert systems to
operational use.
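The 30% to 60% figure can be related to the tabulated results. The
sketch below assumes the figure refers to percentage-point differences
between good and poor mental model subjects in the process-inconsistent
conditions of Experiments 1 and 3 (the setting this conclusion
addresses); the report does not say how the range was computed, so
relative drops are printed alongside for comparison.

```python
# Percent correct in the process-inconsistent conditions, by mental model quality
# (Experiment 1: problems answered correctly; Experiment 3: errors isolated).
INCONSISTENT = {
    "Experiment 1": {"good": 83, "poor": 25},
    "Experiment 3": {"good": 65, "poor": 30},
}


def mental_model_drop(cells):
    """Return (percentage-point drop, relative drop in percent rounded) from good to poor."""
    points = cells["good"] - cells["poor"]
    return points, round(100 * points / cells["good"])


for name, cells in INCONSISTENT.items():
    points, relative = mental_model_drop(cells)
    print(f"{name}: {points} points ({relative}% relative)")
# Experiment 1: 58 points (70% relative)
# Experiment 3: 35 points (54% relative)
```

Read as percentage points, the two drops (35 and 58) fall within the
stated range.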

Regarding the completeness of the above research, it should be
recognized that these experiments operationalized cognitive
consistency as the match between the user's and the expert system's
procedures. Other dimensions of cognitive consistency need to be
examined. Furthermore, a node description command was the only type
of explanation a user could receive in this study. This was chosen
primarily because of the imposed time constraint and the nature of the
task setting. Other explanation capabilities should be examined.


We had planned a design for Experiment 4 that decomposed mental model
into several independently manipulable parts. However, limited
subject availability made it impossible to have more than one good
mental model condition. In defense of our unitary mental model
manipulation, it should also be noted that, despite a considerable
amount of interest in the concept of mental and cognitive models,
empirical research has not demonstrated the generality of the impact
of the mental model on user/machine interaction (Rouse, 1985). As a
result, we feel that the key contribution of the research discussed
above has been to empirically establish 'mental model' as a key driver
in the specific context of user/expert system interaction.